Model-based clustering of mixed data with sparse dependence
Abstract
Mixed data refers to a mixture of continuous and categorical variables. The clustering problem with mixed data is a long-standing statistical problem. The latent Gaussian mixture model, a model-based approach for such a problem, has received attention owing to its simplicity and interpretability. However, these approaches are prone to dimensionality problems. Specifically, parameters must be estimated for each group, and the number of covariance parameters is quadratic in the number of variables. To address this, we propose “regClustMD,” a novel method that can address sparse dependence among variables. We consider a sparse latent Gaussian mixture model, assuming that the precision matrix between variables has sparse nonzero elements. We propose maximizing the penalized complete log-likelihood using a Monte Carlo expectation-maximization (MCEM) algorithm. We demonstrate our method through a simulation study and real-world examples.
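The dimensionality concern in the abstract can be made concrete with a simple count. The following is a minimal sketch, not taken from the paper: it assumes each group has its own unrestricted symmetric covariance matrix over p variables, and the function name and the choice of three groups are purely illustrative.

```python
def covariance_param_count(num_variables: int, num_groups: int) -> int:
    """Count free covariance parameters across all groups.

    Assumption (illustrative, not the paper's exact model): every group has
    its own unrestricted symmetric covariance matrix, which has
    p * (p + 1) / 2 free entries for p variables.
    """
    per_group = num_variables * (num_variables + 1) // 2
    return num_groups * per_group


if __name__ == "__main__":
    # The count grows quadratically in the number of variables.
    for p in (5, 10, 50, 100):
        print(p, covariance_param_count(p, num_groups=3))
```

Under this assumption, doubling the number of variables roughly quadruples the number of covariance parameters, which is the motivation for restricting the precision matrix to few nonzero elements.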
Similar resources
Model based clustering for mixed data: clustMD
A model based clustering procedure for data of mixed type, clustMD, is developed using a latent variable model. It is proposed that a latent variable, following a mixture of Gaussian distributions, generates the observed data of mixed type. The observed data may be any combination of continuous, binary, ordinal or nominal variables. clustMD employs a parsimonious covariance structure for the lat...
Model-based Co-clustering for High Dimensional Sparse Data
We propose a novel model based on the von Mises-Fisher (vMF) distribution for co-clustering high dimensional sparse matrices. While existing vMF-based models are only suitable for clustering along one dimension, our model acts simultaneously on both dimensions of a data matrix. Thereby it has the advantage of exploiting the inherent duality between rows and columns. Setting our model under the m...
Model-based clustering of Gaussian copulas for mixed data
Clustering task of mixed data is a challenging problem. In a probabilistic framework, the main difficulty is due to a shortage of conventional distributions for such data. In this paper, we propose to achieve the mixed data clustering with a Gaussian copula mixture model, since copulas, and in particular the Gaussian ones, are powerful tools for easily modelling the distribution of multivariate...
Clustering of Conceptual Graphs with Sparse Data
This paper gives a theoretical framework for clustering a set of conceptual graphs characterized by sparse descriptions. The formed clusters are named in an intelligible manner through the concept of stereotype, based on the notion of default generalization. The cognitive model we propose relies on sets of stereotypes and makes it possible to save data in a structured memory.
Mixture model clustering for mixed data with missing information
One difficulty with classification studies is unobserved or missing observations that often occur in multivariate datasets. The mixture likelihood approach to clustering has been well developed and is much used, particularly for mixtures where the component distributions are multivariate normal. It is shown that this approach can be extended to analyse data with mixed categorical and continuous at...
Journal
Journal title: IEEE Access
Year: 2023
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2023.3296790